Generating Annotated Corpora for Reading Comprehension and Question Answering Evaluation

Authors

  • Tiphaine Dalmas
  • Jochen L. Leidner
  • Bonnie Webber
  • Claire Grover
  • Johan Bos

Abstract

Recently, reading comprehension tests for students and adult language learners have received increased attention within the NLP community as a means to develop and evaluate robust natural language question answering (NLQA) methods. We present our ongoing work on automatically creating richly annotated corpus resources for NLQA and on comparing automatic methods for answering questions against this data set. Starting with the CBC4Kids corpus, we have added XML annotation layers for tokenization, lemmatization, stemming, semantic classes, POS tags and best-ranking syntactic parses to support future experiments with semantic answer retrieval and inference. Using this resource, we have calculated a baseline for word-overlap based answer retrieval (Hirschman et al., 1999) on the CBC4Kids data and found that the method performs slightly better on CBC4Kids than on the REMEDIA corpus. We hope that our richly annotated version of the CBC4Kids corpus will become a standard resource, especially as a controlled environment for evaluating inference-based techniques.
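The word-overlap baseline mentioned above answers a question by choosing the story sentence that shares the most words with it. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: simple lowercase tokenization, a small hypothetical stopword list, no stemming, and ties broken in favour of the earliest sentence. The names `STOPWORDS`, `bag_of_words` and `best_answer_sentence` are illustrative only.

```python
import re
from typing import List, Set, Tuple

# Hypothetical stopword list for illustration; the preprocessing used for the
# reported baseline (stopwords, stemming, etc.) is an assumption here.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "is", "was", "did", "do",
             "what", "who", "when", "where", "why", "how", "which"}


def bag_of_words(text: str) -> Set[str]:
    """Lowercase the text, split on non-alphanumerics, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def best_answer_sentence(question: str, sentences: List[str]) -> Tuple[int, int]:
    """Return (index, overlap) of the sentence sharing the most words with the
    question. Ties go to the earliest sentence; an empty story gives (-1, -1)."""
    q = bag_of_words(question)
    best_idx, best_score = -1, -1
    for i, sent in enumerate(sentences):
        score = len(q & bag_of_words(sent))  # size of the word-set intersection
        if score > best_score:  # strict '>' keeps the earliest sentence on ties
            best_idx, best_score = i, score
    return best_idx, best_score


if __name__ == "__main__":
    story = ["Maria lives in Toronto.",
             "She plays hockey every winter.",
             "Her team won the city cup last year."]
    print(best_answer_sentence("What did her team win last year?", story))
```

On the toy story above, the third sentence is selected with an overlap of 4 ("her", "team", "last", "year"). Note that "won" and "win" do not match on raw tokens; a variant could compare the lemma or stem annotation layers of the corpus instead, which is the kind of experiment such layers are meant to support.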

Similar resources

Presenting a Religious Question Answering Corpus in Persian

Question answering is a field of natural language processing and information retrieval that has drawn researchers' attention in recent decades. Due to growing interest in this field of research, the need for appropriate data sources has become apparent. Most research on developing question answering corpora has so far been done in English, but in other languages such as Persian, the lack of these co...

Improving Question Answering for Reading Comprehension Tests by Combining Multiple Systems

Most work on reading comprehension question answering systems has focused on improving performance by adding complex natural language processing (NLP) components to such systems rather than by combining the output of multiple systems. Our paper empirically evaluates whether combining the outputs of seven such systems submitted as the final projects for a graduate level class can improve over th...

Effects of reading strategies on comprehension and metacomprehension

Two experiments were conducted to find effective reading strategies. Generating keywords, summarizing, answering questions, and rereading strategies were tested in Experiment 1. Answering questions yielded the best performance both in comprehension and in the accuracy of judgment of learning (JOL). In Experiment 2, four reading strategies were tested to find the key aspects of answering question st...

Interlingual Annotation of Parallel Text Corpora: A New Framework for Annotation and Evaluation

This paper focuses on the next step in the creation of a system of meaning representation and the development of semantically-annotated parallel corpora, for use in applications such as machine translation, question answering, text summarization, and information retrieval. The work described below constitutes the first effort of any kind to provide parallel corpora annotated with detailed deep ...

Evaluating Machine Reading Systems through Comprehension Tests

This paper describes a methodology for testing and evaluating the performance of Machine Reading systems through Question Answering and Reading Comprehension Tests. The methodology is being used in QA4MRE (QA for Machine Reading Evaluation), one of the labs of CLEF. We report here the conclusions and lessons learned after the first campaign in 2011.

Publication date: 2003